This document presents the data processing and full analysis of syllable duration, intensity, and F0 across different languages (Kazakh, Russian, and Code-Switching) using R.
Linguistic Phenomenon: Kazakh-Russian intra-word code-switching: [[Russian Noun] + [Kazakh Noun suffix]]
Working RQ: How do the stress patterns of the two languages interact in word-internal switches? Specifically, does adding Kazakh suffixes to Russian noun stems shift stress to the final syllable, as Kazakh phonology would predict?
Working predictions:
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
## Rows: 1976 Columns: 29
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (17): Filename, Word, Annotation, MaxF0Hz, MinF0Hz, MeanF0, Centre_MeanF...
## dbl (12): Word_beg, Word_end, Word_dur_ms, Begin, End, Duration_in_ms, Max_d...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Warning: Expected 4 pieces. Additional pieces discarded in 60 rows [19, 42, 54, 96, 97,
## 126, 129, 147, 167, 254, 268, 284, 296, 304, 314, 341, 376, 478, 507, 510,
## ...].
## Warning: There were 3 warnings in `mutate()`.
## The first warning was:
## ℹ In argument: `MeanF0 = as.numeric(MeanF0)`.
## Caused by warning:
## ! NAs introduced by coercion
## ℹ Run `dplyr::last_dplyr_warnings()` to see the 2 remaining warnings.
## # A tibble: 1,976 × 43
## Filename Speaker Gender Word Word_beg Word_end Word_dur_ms SyllPos SyllStr
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <chr> <chr>
## 1 Speaker_1… Speake… male вече… 1.89 2.50 610. s1 cv
## 2 Speaker_1… Speake… male вече… 1.89 2.50 610. s2 cv
## 3 Speaker_1… Speake… male вече… 1.89 2.50 610. s3 cv
## 4 Speaker_1… Speake… male олжа… 8.48 9.16 677. s1 vc
## 5 Speaker_1… Speake… male олжа… 8.48 9.16 677. s2 cv
## 6 Speaker_1… Speake… male олжа… 8.48 9.16 677. s3 cvc
## 7 Speaker_1… Speake… male шеке… 13.5 14.2 642. s1 cv
## 8 Speaker_1… Speake… male шеке… 13.5 14.2 642. s2 cv
## 9 Speaker_1… Speake… male шеке… 13.5 14.2 642. s3 cvc
## 10 Speaker_1… Speake… male сөре… 18.8 19.4 628. s1 cv
## # ℹ 1,966 more rows
## # ℹ 34 more variables: SyllIPA <chr>, Stress <chr>, Begin <dbl>, End <dbl>,
## # Duration_in_ms <dbl>, Max_dB <dbl>, Min_dB <dbl>, Mean_dB <dbl>,
## # Centre_mean_dB <dbl>, MaxF0Hz <dbl>, MinF0Hz <dbl>, MeanF0 <dbl>,
## # Centre_MeanF0 <chr>, Language <chr>, SuffixCase <chr>, WordForm <chr>,
## # LatinScript <chr>, Gloss <chr>, WordClass <chr>, StressedSyll <dbl>,
## # NounGender <chr>, Declension <dbl>, StressShift <chr>, ShiftDirect <chr>, …
OBSERVATION:
OBSERVATION:
OBSERVATION:
OBSERVATION:
## Syllable Duration
## CS tokens: Syllable duration by stress
### CS tokens: Stress & WordForm interaction
### Compare s3 duration for Kazakh vs CS tokens
##
## Call:
## lm(formula = Duration_in_ms ~ SyllPos + Language, data = df_full_sample)
##
## Residuals:
## Min 1Q Median 3Q Max
## -190.22 -53.50 -6.03 45.21 606.30
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 216.307 3.340 64.754 < 2e-16 ***
## SyllPoss2 21.983 3.735 5.885 4.67e-09 ***
## SyllPoss3 15.095 4.667 3.234 0.00124 **
## LanguageRus 26.791 4.716 5.681 1.54e-08 ***
## LanguageCS 11.259 3.721 3.026 0.00251 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 74.26 on 1949 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.03366, Adjusted R-squared: 0.03168
## F-statistic: 16.97 on 4 and 1949 DF, p-value: 1.088e-13
OBSERVATION:
## Loading required package: Matrix
##
## Attaching package: 'Matrix'
## The following objects are masked from 'package:tidyr':
##
## expand, pack, unpack
##
## Attaching package: 'lmerTest'
## The following object is masked from 'package:lme4':
##
## lmer
## The following object is masked from 'package:stats':
##
## step
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_col()`).
roots_kaz <- kz_all_syll %>%
filter(WordForm == "uninflected")
suffixed_kaz <- kz_all_syll %>%
filter(WordForm == "inflected")
model_roots_dur <- lmer(Duration_in_ms ~ SyllPos + (1 | Speaker) + (1 | Word), data = roots_kaz)
summary(model_roots_dur)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: Duration_in_ms ~ SyllPos + (1 | Speaker) + (1 | Word)
## Data: roots_kaz
##
## REML criterion at convergence: 3535.1
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.75448 -0.68442 -0.08159 0.60448 3.09951
##
## Random effects:
## Groups Name Variance Std.Dev.
## Word (Intercept) 276.2 16.62
## Speaker (Intercept) 522.0 22.85
## Residual 3508.5 59.23
## Number of obs: 320, groups: Word, 40; Speaker, 4
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 226.529 12.623 3.801 17.946 8.15e-05 ***
## SyllPoss2 52.619 6.622 276.000 7.946 4.94e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## SyllPoss2 -0.262
model_suffixed_dur <- lmer(Duration_in_ms ~ SyllPos + (1 | Speaker) + (1 | Word), data = suffixed_kaz)
summary(model_suffixed_dur)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: Duration_in_ms ~ SyllPos + (1 | Speaker) + (1 | Word)
## Data: suffixed_kaz
##
## REML criterion at convergence: 5155.8
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.7199 -0.6445 -0.0443 0.6274 3.5197
##
## Random effects:
## Groups Name Variance Std.Dev.
## Word (Intercept) 113.1 10.63
## Speaker (Intercept) 371.2 19.27
## Residual 3068.7 55.40
## Number of obs: 474, groups: Word, 40; Speaker, 4
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 195.9828 10.7268 4.0289 18.270 5.01e-05 ***
## SyllPoss2 -0.4993 6.2142 429.0131 -0.080 0.936
## SyllPoss3 47.6350 6.2560 430.5744 7.614 1.69e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) SyllP2
## SyllPoss2 -0.292
## SyllPoss3 -0.290 0.500
model_roots_int <- lmer(Mean_dB ~ SyllPos + (1 | Speaker) + (1 | Word), data = roots_kaz)
summary(model_roots_int)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: Mean_dB ~ SyllPos + (1 | Speaker) + (1 | Word)
## Data: roots_kaz
##
## REML criterion at convergence: 1738.5
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.7280 -0.6694 0.1156 0.6561 2.3317
##
## Random effects:
## Groups Name Variance Std.Dev.
## Word (Intercept) 3.367 1.835
## Speaker (Intercept) 18.763 4.332
## Residual 11.011 3.318
## Number of obs: 320, groups: Word, 40; Speaker, 4
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 61.8897 2.2008 3.1528 28.121 6.83e-05 ***
## SyllPoss2 -0.1264 0.3710 276.0001 -0.341 0.734
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## SyllPoss2 -0.084
model_suffixed_int <- lmer(Mean_dB ~ SyllPos + (1 | Speaker) + (1 | Word), data = suffixed_kaz)
summary(model_suffixed_int)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: Mean_dB ~ SyllPos + (1 | Speaker) + (1 | Word)
## Data: suffixed_kaz
##
## REML criterion at convergence: 2566.9
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.2422 -0.6139 0.1288 0.6423 2.5559
##
## Random effects:
## Groups Name Variance Std.Dev.
## Word (Intercept) 5.631 2.373
## Speaker (Intercept) 3.677 1.918
## Residual 10.957 3.310
## Number of obs: 474, groups: Word, 40; Speaker, 4
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 68.3941 1.0628 4.2990 64.353 1.4e-07 ***
## SyllPoss2 1.0379 0.3714 429.0521 2.794 0.00544 **
## SyllPoss3 0.2492 0.3743 429.4918 0.666 0.50594
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) SyllP2
## SyllPoss2 -0.176
## SyllPoss3 -0.175 0.499
model_roots_f0 <- lmer(MeanF0 ~ SyllPos + (1 | Speaker) + (1 | Word), data = roots_kaz)
summary(model_roots_f0)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: MeanF0 ~ SyllPos + (1 | Speaker) + (1 | Word)
## Data: roots_kaz
##
## REML criterion at convergence: 2611
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.9693 -0.6668 0.0422 0.5273 3.2964
##
## Random effects:
## Groups Name Variance Std.Dev.
## Word (Intercept) 4.152 2.038
## Speaker (Intercept) 4772.994 69.087
## Residual 200.816 14.171
## Number of obs: 318, groups: Word, 40; Speaker, 4
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 162.760 34.563 3.004 4.709 0.0181 *
## SyllPoss2 -14.998 1.590 273.970 -9.435 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## SyllPoss2 -0.023
model_suffixed_f0 <- lmer(MeanF0 ~ SyllPos + (1 | Speaker) + (1 | Word), data = suffixed_kaz)
summary(model_suffixed_f0)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: MeanF0 ~ SyllPos + (1 | Speaker) + (1 | Word)
## Data: suffixed_kaz
##
## REML criterion at convergence: 3634.2
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.7271 -0.4804 -0.0797 0.5228 2.7992
##
## Random effects:
## Groups Name Variance Std.Dev.
## Word (Intercept) 13.77 3.711
## Speaker (Intercept) 4511.56 67.168
## Residual 133.27 11.544
## Number of obs: 464, groups: Word, 40; Speaker, 4
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 157.017 33.602 3.005 4.673 0.0184 *
## SyllPoss2 -1.396 1.320 423.975 -1.058 0.2907
## SyllPoss3 11.264 1.322 424.756 8.518 2.84e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) SyllP2
## SyllPoss2 -0.020
## SyllPoss3 -0.020 0.510
OBSERVATION: Kazakh Stress and its Correlates
| Dataset | Acoustic Measure | SyllPos Effect | Estimate | t-value | p-value | Significance |
|---|---|---|---|---|---|---|
| Roots | Duration | s2 | +52.62 | 7.95 | < 0.001 | *** |
| Roots | Mean_dB | s2 | -0.13 | -0.34 | 0.734 | n.s. |
| Roots | MeanF0 | s2 | -15.00 | -9.44 | < 0.001 | *** |
| Suffixed | Duration | s2 | -0.50 | -0.08 | 0.936 | n.s. |
| Suffixed | Duration | s3 | +47.64 | 7.61 | < 0.001 | *** |
| Suffixed | Mean_dB | s2 | +1.04 | 2.79 | 0.005 | ** |
| Suffixed | Mean_dB | s3 | +0.25 | 0.67 | 0.506 | n.s. |
| Suffixed | MeanF0 | s2 | -1.40 | -1.06 | 0.291 | n.s. |
| Suffixed | MeanF0 | s3 | +11.26 | 8.52 | < 0.001 | *** |
In uninflected roots, stress appears to fall on the second syllable, as shown by its longer duration (+52.62 ms) and lower F0 (-15.00 Hz) relative to s1. Does pitch play an edge-marking role, as in Uyghur?
In inflected forms, stress appears to shift to the suffix, reflected in longer duration and elevated F0 (Why?) in the final syllable (s3).
Intensity (Mean_dB) is not a consistent cue across word types and positions. This aligns with previous findings that duration is a more robust stress correlate in Kazakh and that the role of pitch needs to be reassessed.
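Because `SyllPos` is treatment-coded with s1 as the reference level, each estimate in the table is a difference from the s1 intercept. As a minimal sketch, the snippet below recovers the model-implied mean durations per syllable position from the fixed effects reported above. The object and function names (`roots_dur`, `suffixed_dur`, `fitted_means`) are illustrative and not part of the original analysis.

```r
# Fixed-effect estimates (ms) copied from summary(model_roots_dur) and
# summary(model_suffixed_dur). Object names are illustrative only.
roots_dur    <- c(intercept = 226.529, s2 = 52.619)
suffixed_dur <- c(intercept = 195.9828, s2 = -0.4993, s3 = 47.6350)

# Under treatment coding, the fitted mean of a non-reference level is
# the intercept plus that level's coefficient; s1 is the intercept itself.
fitted_means <- function(coefs) {
  offsets <- coefs[names(coefs) != "intercept"]
  means <- c(0, offsets) + coefs[["intercept"]]
  names(means) <- c("s1", names(offsets))
  round(means, 2)
}

fitted_means(roots_dur)     # s1 = 226.53, s2 = 279.15
fitted_means(suffixed_dur)  # s1 = 195.98, s2 = 195.48, s3 = 243.62
```

These are population-level means only; they ignore the speaker and word random intercepts.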
# df_rus
# Filter Russian syllables with valid Stress
rus_all_syll <- df_full_sample %>%
filter(Language == "Rus", !is.na(Stress))
# Summarize by stress and word form
summary_rus_all <- rus_all_syll %>%
group_by(Stress, WordForm) %>%
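summarise(
mean_dur = mean(Duration_in_ms, na.rm = TRUE),
sd_dur = sd(Duration_in_ms, na.rm = TRUE),
n_dur = sum(!is.na(Duration_in_ms)),
mean_dB = mean(Mean_dB, na.rm = TRUE),
sd_dB = sd(Mean_dB, na.rm = TRUE),
n_dB = sum(!is.na(Mean_dB)),
mean_f0 = mean(MeanF0, na.rm = TRUE),
sd_f0 = sd(MeanF0, na.rm = TRUE),
n_f0 = sum(!is.na(MeanF0)),
n = n(),
.groups = "drop"
) %>%
# mean() and sd() drop NAs, so divide by the non-missing count rather than n()
mutate(
se_dur = sd_dur / sqrt(n_dur),
se_dB = sd_dB / sqrt(n_dB),
se_f0 = sd_f0 / sqrt(n_f0)
)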
# Prevent NA level from sneaking in
summary_rus_all_complete <- summary_rus_all %>%
mutate(Stress = as.character(Stress)) %>%
complete(Stress, WordForm, fill = list(
mean_dur = NA,
se_dur = NA,
mean_dB = NA,
se_dB = NA,
mean_f0 = NA,
se_f0 = NA
)) %>%
filter(!is.na(Stress)) %>%
mutate(Stress = factor(Stress, levels = c("stressed", "unstressed"))) # Explicit order
# Duration Plot
rus_dur <- ggplot(summary_rus_all_complete, aes(x = Stress, y = mean_dur, fill = WordForm)) +
geom_col(position = position_dodge(width = 0.8), width = 0.7) +
geom_errorbar(
aes(ymin = mean_dur - se_dur, ymax = mean_dur + se_dur),
position = position_dodge(width = 0.8),
width = 0.2
) +
labs(
x = "Stressed Syllable",
y = "Mean Duration (ms)",
fill = "Word Form"
) +
theme_minimal(base_size = 14) +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
ggsave("rus_duration_plot.png", plot = rus_dur, width = 8, height = 6, dpi = 300, bg = "white")
# Intensity Plot
rus_db <- ggplot(summary_rus_all_complete, aes(x = Stress, y = mean_dB, fill = WordForm)) +
geom_col(position = position_dodge(width = 0.8), width = 0.7) +
geom_errorbar(
aes(ymin = mean_dB - se_dB, ymax = mean_dB + se_dB),
position = position_dodge(width = 0.8),
width = 0.2
) +
labs(
x = "Stressed Syllable",
y = "Mean Intensity (dB)",
fill = "Word Form"
) +
theme_minimal(base_size = 14) +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
ggsave("rus_intensity_plot.png", plot = rus_db, width = 8, height = 6, dpi = 300, bg = "white")
# F0 Plot
rus_f0 <- ggplot(summary_rus_all_complete, aes(x = Stress, y = mean_f0, fill = WordForm)) +
geom_col(position = position_dodge(width = 0.8), width = 0.7) +
geom_errorbar(
aes(ymin = mean_f0 - se_f0, ymax = mean_f0 + se_f0),
position = position_dodge(width = 0.8),
width = 0.2
) +
labs(
x = "Stressed Syllable",
y = "Mean F0 (Hz)",
fill = "Word Form"
) +
theme_minimal(base_size = 14) +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
ggsave("rus_f0_plot.png", plot = rus_f0, width = 8, height = 6, dpi = 300, bg = "white")
# Horizontal panel with labels A, B, C
rus_dur_clean <- rus_dur + theme(axis.title.x = element_blank())
rus_f0_clean <- rus_f0 + theme(axis.title.x = element_blank())
rus_panel_horizontal <- rus_dur_clean + rus_db + rus_f0_clean +
plot_layout(ncol = 3, guides = "collect") +
plot_annotation(tag_levels = 'A')
ggsave("rus_panel_horizontal.png", plot = rus_panel_horizontal, width = 12, height = 5, dpi = 300, bg = "white")
rus_panel_horizontal
# Filter Russian tokens with valid syllable position and stress info
rus_all_syll_posstress <- df_full_sample %>%
filter(Language == "Rus", !is.na(SyllPos), !is.na(Stress))
# Summarize by syllable position and stress
summary_rus_posstress <- rus_all_syll_posstress %>%
group_by(SyllPos, Stress) %>%
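summarise(
mean_dur = mean(Duration_in_ms, na.rm = TRUE),
sd_dur = sd(Duration_in_ms, na.rm = TRUE),
n_dur = sum(!is.na(Duration_in_ms)),
mean_dB = mean(Mean_dB, na.rm = TRUE),
sd_dB = sd(Mean_dB, na.rm = TRUE),
n_dB = sum(!is.na(Mean_dB)),
mean_f0 = mean(MeanF0, na.rm = TRUE),
sd_f0 = sd(MeanF0, na.rm = TRUE),
n_f0 = sum(!is.na(MeanF0)),
n = n(),
.groups = "drop"
) %>%
# mean() and sd() drop NAs, so divide by the non-missing count rather than n()
mutate(
se_dur = sd_dur / sqrt(n_dur),
se_dB = sd_dB / sqrt(n_dB),
se_f0 = sd_f0 / sqrt(n_f0)
)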
# Ensure complete combinations and order factors
summary_rus_posstress_complete <- summary_rus_posstress %>%
complete(SyllPos, Stress, fill = list(
mean_dur = NA, se_dur = NA,
mean_dB = NA, se_dB = NA,
mean_f0 = NA, se_f0 = NA
)) %>%
filter(!is.na(SyllPos) & !is.na(Stress)) %>%
mutate(
SyllPos = factor(SyllPos, levels = c("s1", "s2", "s3")),
Stress = factor(Stress, levels = c("stressed", "unstressed"))
)
# Duration plot
rusps_dur <- ggplot(summary_rus_posstress_complete, aes(x = SyllPos, y = mean_dur, fill = Stress)) +
geom_col(position = position_dodge(width = 0.8), width = 0.7) +
geom_errorbar(aes(ymin = mean_dur - se_dur, ymax = mean_dur + se_dur),
position = position_dodge(width = 0.8), width = 0.2) +
labs(x = "Syllable Position", y = "Mean Duration (ms)", fill = "Stress") +
theme_minimal(base_size = 14)
ggsave("rusps_duration_plot.png", plot = rusps_dur, width = 8, height = 6, dpi = 300, bg = "white")
## Warning: Removed 3 rows containing missing values or values outside the scale range
## (`geom_col()`).
# Intensity plot
rusps_db <- ggplot(summary_rus_posstress_complete, aes(x = SyllPos, y = mean_dB, fill = Stress)) +
geom_col(position = position_dodge(width = 0.8), width = 0.7) +
geom_errorbar(aes(ymin = mean_dB - se_dB, ymax = mean_dB + se_dB),
position = position_dodge(width = 0.8), width = 0.2) +
labs(x = "Syllable Position", y = "Mean Intensity (dB)", fill = "Stress") +
theme_minimal(base_size = 14)
ggsave("rusps_intensity_plot.png", plot = rusps_db, width = 8, height = 6, dpi = 300, bg = "white")
## Warning: Removed 3 rows containing missing values or values outside the scale range
## (`geom_col()`).
# F0 plot
rusps_f0 <- ggplot(summary_rus_posstress_complete, aes(x = SyllPos, y = mean_f0, fill = Stress)) +
geom_col(position = position_dodge(width = 0.8), width = 0.7) +
geom_errorbar(aes(ymin = mean_f0 - se_f0, ymax = mean_f0 + se_f0),
position = position_dodge(width = 0.8), width = 0.2) +
labs(x = "Syllable Position", y = "Mean F0 (Hz)", fill = "Stress") +
theme_minimal(base_size = 14)
ggsave("rusps_f0_plot.png", plot = rusps_f0, width = 8, height = 6, dpi = 300, bg = "white")
## Warning: Removed 3 rows containing missing values or values outside the scale range
## (`geom_col()`).
# Combine into horizontal panel
rusps_dur_clean <- rusps_dur + theme(axis.title.x = element_blank())
rusps_f0_clean <- rusps_f0 + theme(axis.title.x = element_blank())
rusps_panel <- rusps_dur_clean + rusps_db + rusps_f0_clean +
plot_layout(ncol = 3, guides = "collect") +
plot_annotation(tag_levels = 'A')
ggsave("rusps_panel_horizontal.png", plot = rusps_panel, width = 12, height = 5, dpi = 300, bg = "white")
## Warning: Removed 3 rows containing missing values or values outside the scale range
## (`geom_col()`).
rusps_panel
## Warning: Removed 3 rows containing missing values or values outside the scale range
## (`geom_col()`).
## Russian: Stress Correlates
model_rus_dur <- lmer(Duration_in_ms ~ Stress + (1 | Speaker) + (1 | Word), data = rus_all_syll)
summary(model_rus_dur)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: Duration_in_ms ~ Stress + (1 | Speaker) + (1 | Word)
## Data: rus_all_syll
##
## REML criterion at convergence: 4149.6
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.5322 -0.6343 -0.0307 0.5576 6.9722
##
## Random effects:
## Groups Name Variance Std.Dev.
## Word (Intercept) 1287 35.88
## Speaker (Intercept) 1168 34.17
## Residual 5108 71.47
## Number of obs: 361, groups: Word, 41; Speaker, 4
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 279.552 18.899 4.092 14.792 0.000105 ***
## Stressunstressed -34.179 7.690 322.118 -4.445 1.21e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## Strssnstrss -0.229
model_rus_int <- lmer(Mean_dB ~ Stress + (1 | Speaker) + (1 | Word), data = rus_all_syll)
summary(model_rus_int)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: Mean_dB ~ Stress + (1 | Speaker) + (1 | Word)
## Data: rus_all_syll
##
## REML criterion at convergence: 1975
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.64402 -0.60711 0.02214 0.57733 2.98646
##
## Random effects:
## Groups Name Variance Std.Dev.
## Word (Intercept) 2.501 1.582
## Speaker (Intercept) 2.576 1.605
## Residual 12.141 3.484
## Number of obs: 361, groups: Word, 41; Speaker, 4
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 70.4119 0.8863 4.0326 79.449 1.35e-07 ***
## Stressunstressed -1.9720 0.3747 323.6696 -5.263 2.58e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## Strssnstrss -0.238
model_rus_f0 <- lmer(MeanF0 ~ Stress + (1 | Speaker) + (1 | Word), data = rus_all_syll)
summary(model_rus_f0)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: MeanF0 ~ Stress + (1 | Speaker) + (1 | Word)
## Data: rus_all_syll
##
## REML criterion at convergence: 3224
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.8145 -0.6282 -0.0776 0.5820 3.1984
##
## Random effects:
## Groups Name Variance Std.Dev.
## Word (Intercept) 9.298 3.049
## Speaker (Intercept) 2879.566 53.662
## Residual 430.933 20.759
## Number of obs: 360, groups: Word, 41; Speaker, 4
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 163.009 26.888 3.015 6.063 0.008874 **
## Stressunstressed -8.390 2.219 332.885 -3.782 0.000185 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## Strssnstrss -0.047
OBSERVATION: Russian Stress
Stress Effects in Russian Words
| Acoustic Measure | Predictor | Estimate | t-value | p-value | Significance |
|---|---|---|---|---|---|
| Duration | Stress (unstressed) | -34.18 | -4.45 | 1.21e-05 | *** |
| Intensity | Stress (unstressed) | -1.97 | -5.26 | 2.58e-07 | *** |
| F0 | Stress (unstressed) | -8.39 | -3.78 | 0.00019 | *** |
The linear mixed-effects models indicate that lexical stress in Russian significantly influences all three acoustic correlates: duration, intensity, and fundamental frequency (F0).
Duration: Unstressed syllables are, on average, 34.18 ms shorter than stressed ones (p < 0.001), highlighting duration as a robust cue to stress.
Intensity (Mean dB): Unstressed syllables are 1.97 dB quieter (p < 0.001), suggesting that loudness is another reliable correlate.
F0: Unstressed syllables are 8.39 Hz lower in pitch than stressed syllables (p < 0.001), consistent with the expectation that pitch rises under stress.
Together, these results show that duration, intensity, and pitch all significantly differentiate stressed from unstressed syllables in Russian. This supports prior findings that Russian marks stress acoustically across multiple phonetic dimensions, more strongly than Kazakh does. However, these results should be taken with a grain of salt, since the participants are not native speakers of Russian despite their high bilingual proficiency.
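For reference, the three Russian models share one structure, written out below as a sketch. The indices and symbols are introduced here for illustration and do not appear in the original analysis.

```latex
y_{ijk} = \beta_0 + \beta_1 \,\mathrm{unstressed}_{ijk} + u_j + w_k + \varepsilon_{ijk},
\qquad
u_j \sim \mathcal{N}(0, \sigma_u^2),\quad
w_k \sim \mathcal{N}(0, \sigma_w^2),\quad
\varepsilon_{ijk} \sim \mathcal{N}(0, \sigma^2)
```

Here $y_{ijk}$ is the acoustic measure for syllable $i$ produced by speaker $j$ in word $k$, $\beta_0$ is the mean for stressed syllables (the reference level), $\beta_1$ is the unstressed-minus-stressed difference, and $u_j$ and $w_k$ are the random intercepts for `Speaker` and `Word`. For duration, $\hat\beta_0 = 279.55$ ms and $\hat\beta_1 = -34.18$ ms, so the model-implied mean for unstressed syllables is about $279.55 - 34.18 = 245.37$ ms.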
What this code snippet does: (1) creates subsets of the data frame for CS tokens and for CS and Kazakh tokens; (2) plots hypotheses A, B, C, and D; (3) runs lmer() models on the subsets to test the hypotheses.
# Inspect the full dataset (opens the interactive viewer)
view(df_full_sample)
# Filter CS words with s1/s2
cs_tokens <- df_full_sample %>%
filter(Language == "CS", SyllPos %in% c("s1", "s2")) %>%
group_by(Filename, Word) %>%
filter(all(c("s1", "s2") %in% SyllPos)) %>%
ungroup()
# Keep only tokens that appear once per SyllPos (no duplicate s1/s2)
cs_tokens <- cs_tokens %>%
group_by(Filename, Word, SyllPos) %>%
filter(n() == 1) %>%
ungroup()
# Keep these columns untouched
id_cols <- c("Filename", "Speaker", "Gender", "Word", "Word_beg", "Word_end", "Word_dur_ms",
"Language", "SuffixCase", "WordForm", "LatinScript", "Gloss", "WordClass",
"StressedSyll", "Declension", "NounGender", "StressShift", "ShiftDirect", "AttestedInCS")
# Pivot all columns except for id_cols&SyllPos
pivot_cols <- cs_tokens %>%
select(-all_of(c(id_cols, "SyllPos"))) %>%
names()
# Pivot wider
cs_df_wide <- cs_tokens %>%
pivot_wider(
id_cols = all_of(id_cols),
names_from = SyllPos,
values_from = all_of(pivot_cols),
names_sep = "_"
)
# Filter rows where s1 and s2 data are present
cs_df_wide <- cs_df_wide %>%
filter(
!is.na(Duration_in_ms_s1) & !is.na(Duration_in_ms_s2),
!is.na(Mean_dB_s1) & !is.na(Mean_dB_s2),
!is.na(Max_dB_s1) & !is.na(Max_dB_s2),
!is.na(MeanF0_s1) & !is.na(MeanF0_s2),
!is.na(MaxF0Hz_s1) & !is.na(MaxF0Hz_s2)
)
# Compute ratios
cs_df_wide <- cs_df_wide %>%
mutate(
ratio_s1_s2_dur = Duration_in_ms_s1 / Duration_in_ms_s2,
ratio_mean_int = Mean_dB_s1 / Mean_dB_s2,
ratio_max_int = Max_dB_s1 / Max_dB_s2,
ratio_mean_f0 = MeanF0_s1 / MeanF0_s2,
ratio_max_fo = MaxF0Hz_s1 / MaxF0Hz_s2,
root = Word,
root_stress = StressedSyll
)
#view(cs_tokens)
view(cs_df_wide)
# Create the final dataset
cs_roots <- cs_df_wide %>%
select(Speaker, root, root_stress, WordForm, StressShift, ShiftDirect, ratio_s1_s2_dur, ratio_mean_int, ratio_max_int, ratio_mean_f0, ratio_max_fo)
view(cs_roots)
# Summarize the data
summary_df <- cs_roots %>%
group_by(root_stress, WordForm) %>%
summarise(
mean_ratio_dur = mean(ratio_s1_s2_dur, na.rm = TRUE),
sd_ratio_dur = sd(ratio_s1_s2_dur, na.rm = TRUE),
mean_ratio_int = mean(ratio_mean_int, na.rm = TRUE),
sd_ratio_int = sd(ratio_mean_int, na.rm = TRUE),
mean_ratio_max_int = mean(ratio_max_int, na.rm = TRUE),
sd_ratio_max_int = sd(ratio_max_int, na.rm = TRUE),
mean_ratio_f0 = mean(ratio_mean_f0, na.rm = TRUE),
sd_ratio_f0 = sd(ratio_mean_f0, na.rm = TRUE),
mean_ratio_max_fo = mean(ratio_max_fo, na.rm = TRUE),
sd_ratio_max_fo = sd(ratio_max_fo, na.rm = TRUE),
n = n(),
.groups = "drop"
) %>%
mutate(
se_ratio_dur = sd_ratio_dur / sqrt(n),
se_ratio_int = sd_ratio_int / sqrt(n),
se_ratio_max_int = sd_ratio_max_int / sqrt(n),
se_ratio_f0 = sd_ratio_f0 / sqrt(n),
se_ratio_max_fo = sd_ratio_max_fo / sqrt(n)
)
view(summary_df)
# Plot 1: Duration Ratio
cs_ratio_dur <- ggplot(summary_df, aes(x = factor(root_stress), y = mean_ratio_dur, color = WordForm, shape = WordForm)) +
geom_hline(yintercept = 1, linetype = "dashed", color = "gray50") +
geom_point(position = position_dodge(width = 0.4), size = 7) +
geom_errorbar(
aes(ymin = mean_ratio_dur - se_ratio_dur, ymax = mean_ratio_dur + se_ratio_dur),
position = position_dodge(width = 0.4),
width = 0.2
) +
ylim(.7, 1.3) +
labs(
x = "Root Stress Position",
y = "Mean Duration Ratio (s1:s2)",
color = "Word Form",
shape = "Word Form"
) +
theme_minimal(base_size = 18) +
theme(
axis.title = element_text(size = 18),
axis.text = element_text(size = 16),
legend.title = element_text(size = 16),
legend.text = element_text(size = 14)
)
#print(cs_ratio_dur)
ggsave("cs_ratio_duration.png", plot = cs_ratio_dur, width = 6, height = 4, dpi = 300, bg = "white")
# Plot 2: Mean Intensity Ratio
cs_ratio_mean_int <- ggplot(summary_df, aes(x = factor(root_stress), y = mean_ratio_int, color = WordForm, shape = WordForm)) +
geom_hline(yintercept = 1, linetype = "dashed", color = "gray50") +
geom_point(position = position_dodge(width = 0.4), size = 7) +
geom_errorbar(
aes(ymin = mean_ratio_int - se_ratio_int, ymax = mean_ratio_int + se_ratio_int),
position = position_dodge(width = 0.4),
width = 0.2
) +
ylim(.7, 1.3) +
labs(
x = "Root Stress Position",
y = "Mean Intensity Ratio (s1:s2)",
color = "Word Form",
shape = "Word Form"
) +
#scale_color_manual(values = c("uninflected" = "#2ca02c", "inflected" = "#9467bd")) +
#scale_shape_manual(values = c("uninflected" = 15, "inflected" = 18)) +
theme_minimal(base_size = 18) +
theme(
axis.title = element_text(size = 18),
axis.text = element_text(size = 16),
legend.title = element_text(size = 16),
legend.text = element_text(size = 14)
)
#print(cs_ratio_mean_int)
ggsave("cs_ratio_mean_intensity.png", plot = cs_ratio_mean_int, width = 6, height = 4, dpi = 300, bg = "white")
# Plot 3: Max Intensity Ratio
# cs_ratio_max_int <- ggplot(summary_df, aes(x = factor(root_stress), y = mean_ratio_max_int, color = WordForm, shape = WordForm)) +
# geom_hline(yintercept = 1, linetype = "dashed", color = "gray50") +
# geom_point(position = position_dodge(width = 0.4), size = 7) +
# geom_errorbar(
# aes(ymin = mean_ratio_max_int - se_ratio_max_int, ymax = mean_ratio_max_int + se_ratio_max_int),
# position = position_dodge(width = 0.4),
# width = 0.2
# ) +
# ylim(.7, 1.3) +
# labs(
# x = "Root Stress Position",
# y = "Max Intensity Ratio (s1:s2)",
# color = "Word Form",
# shape = "Word Form"
# ) +
# theme_minimal(base_size = 18) +
# theme(
# axis.title = element_text(size = 18),
# axis.text = element_text(size = 16),
# legend.title = element_text(size = 16),
# legend.text = element_text(size = 14)
# )
#
# print(cs_ratio_max_int)
# ggsave("cs_ratio_max_intensity.png", plot = cs_ratio_max_int, width = 6, height = 4, dpi = 300)
# Plot 4: Mean F0 Ratio
cs_ratio_mean_f0 <- ggplot(summary_df, aes(x = factor(root_stress), y = mean_ratio_f0, color = WordForm, shape = WordForm)) +
geom_hline(yintercept = 1, linetype = "dashed", color = "gray50") +
geom_point(position = position_dodge(width = 0.4), size = 7) +
geom_errorbar(
aes(ymin = mean_ratio_f0 - se_ratio_f0, ymax = mean_ratio_f0 + se_ratio_f0),
position = position_dodge(width = 0.4),
width = 0.2
) +
ylim(.7, 1.3) +
labs(
x = "Root Stress Position",
y = "Mean F0 Ratio (s1:s2)",
color = "Word Form",
shape = "Word Form"
) +
#scale_color_manual(values = c("uninflected" = "#ff7f0e", "inflected" = "#e377c2")) +
#scale_shape_manual(values = c("uninflected" = 8, "inflected" = 4)) +
theme_minimal(base_size = 18) +
theme(
axis.title = element_text(size = 18),
axis.text = element_text(size = 16),
legend.title = element_text(size = 16),
legend.text = element_text(size = 14)
)
#print(cs_ratio_mean_f0)
ggsave("cs_ratio_mean_f0.png", plot = cs_ratio_mean_f0, width = 6, height = 4, dpi = 300, bg = "white")
# Plot 5: Max F0 Ratio
# cs_ratio_max_f0 <- ggplot(summary_df, aes(x = factor(root_stress), y = mean_ratio_max_fo, color = WordForm, shape = WordForm)) +
# geom_hline(yintercept = 1, linetype = "dashed", color = "gray50") +
# geom_point(position = position_dodge(width = 0.4), size = 7) +
# geom_errorbar(
# aes(ymin = mean_ratio_max_fo - se_ratio_max_fo, ymax = mean_ratio_max_fo + se_ratio_max_fo),
# position = position_dodge(width = 0.4),
# width = 0.2
# ) +
# ylim(.7, 1.3) +
# labs(
# x = "Root Stress Position",
# y = "Max F0 Ratio (s1:s2)",
# color = "Word Form",
# shape = "Word Form"
# ) +
# theme_minimal(base_size = 18) +
# theme(
# axis.title = element_text(size = 18),
# axis.text = element_text(size = 16),
# legend.title = element_text(size = 16),
# legend.text = element_text(size = 14)
# )
#
# print(cs_ratio_max_f0)
# ggsave("cs_ratio_max_f0.png", plot = cs_ratio_max_f0, width = 6, height = 4, dpi = 300)
# Combine into horizontal panel
cs_ratio_dur_clean <- cs_ratio_dur + theme(axis.title.x = element_blank())
cs_ratio_mean_f0_clean <- cs_ratio_mean_f0 + theme(axis.title.x = element_blank())
cs_panel_plotA <- cs_ratio_dur_clean + cs_ratio_mean_int + cs_ratio_mean_f0_clean +
plot_layout(ncol = 3, guides = "collect") +
plot_annotation(tag_levels = 'A')
ggsave("plotA_panel_horizontal.png", plot = cs_panel_plotA, width = 12, height = 5, dpi = 300, bg = "white")
cs_panel_plotA
### Run lmer() on the subset of dataset
# Test H1: Stress remains fixed on the root
head(cs_roots)
# The reference level is 'uninflected'; 'inflected' word forms are compared against it in the model output.
# Convert WordForm to a factor, since it has two categorical levels
cs_roots$WordForm <- factor(cs_roots$WordForm)
# Set ref level
cs_roots$WordForm <- relevel(cs_roots$WordForm, ref = "uninflected")
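To illustrate what setting the reference level does, here is a minimal sketch on a toy factor (the vector `word_form` is hypothetical, not the study data). By default `factor()` orders levels alphabetically, so `inflected` would be the baseline; `relevel()` moves `uninflected` to the front so that model coefficients are expressed relative to it.

```r
# Toy vector standing in for cs_roots$WordForm (hypothetical values)
word_form <- factor(c("inflected", "uninflected", "inflected", "uninflected"))
default_levels <- levels(word_form)   # alphabetical order by default

# Make "uninflected" the reference (first) level
word_form <- relevel(word_form, ref = "uninflected")
releveled_levels <- levels(word_form)

# Under treatment contrasts, only non-reference levels get a coefficient column
contrast_columns <- colnames(contrasts(word_form))
print(releveled_levels)
print(contrast_columns)
```

The remaining contrast column is `inflected`, which is why the model output reports a `WordForminflected` term.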
model_a_dur <- lmer(ratio_s1_s2_dur ~ WordForm*factor(root_stress) + (1|root) +(1|Speaker), data=cs_roots)
summary(model_a_dur)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: ratio_s1_s2_dur ~ WordForm * factor(root_stress) + (1 | root) +
## (1 | Speaker)
## Data: cs_roots
##
## REML criterion at convergence: 291.1
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.7190 -0.5276 -0.1247 0.4397 3.3426
##
## Random effects:
## Groups Name Variance Std.Dev.
## root (Intercept) 0.172647 0.41551
## Speaker (Intercept) 0.004828 0.06949
## Residual 0.083040 0.28817
## Number of obs: 302, groups: root, 80; Speaker, 4
##
## Fixed effects:
## Estimate Std. Error df t value
## (Intercept) 1.09456 0.10508 60.18736 10.417
## WordForminflected 0.09193 0.14009 74.39859 0.656
## factor(root_stress)2 -0.28313 0.14009 74.56604 -2.021
## WordForminflected:factor(root_stress)2 0.09705 0.19775 73.96366 0.491
## Pr(>|t|)
## (Intercept) 4.29e-15 ***
## WordForminflected 0.5137
## factor(root_stress)2 0.0469 *
## WordForminflected:factor(root_stress)2 0.6250
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) WrdFrm fc(_)2
## WrdFrmnflct -0.668
## fctr(rt_s)2 -0.668 0.501
## WrdFrm:(_)2 0.473 -0.708 -0.708
model_a_int <- lmer(ratio_mean_int ~ WordForm*factor(root_stress) + (1|root) +(1|Speaker), data=cs_roots)
summary(model_a_int)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: ratio_mean_int ~ WordForm * factor(root_stress) + (1 | root) +
## (1 | Speaker)
## Data: cs_roots
##
## REML criterion at convergence: -693
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.32995 -0.56131 -0.02718 0.60862 2.70449
##
## Random effects:
## Groups Name Variance Std.Dev.
## root (Intercept) 0.004257 0.06525
## Speaker (Intercept) 0.000353 0.01879
## Residual 0.003394 0.05826
## Number of obs: 302, groups: root, 80; Speaker, 4
##
## Fixed effects:
## Estimate Std. Error df t value
## (Intercept) 1.07401 0.01871 29.29282 57.403
## WordForminflected -0.05681 0.02284 76.24030 -2.488
## factor(root_stress)2 -0.06925 0.02284 76.67589 -3.032
## WordForminflected:factor(root_stress)2 0.04370 0.03221 75.69390 1.357
## Pr(>|t|)
## (Intercept) < 2e-16 ***
## WordForminflected 0.01504 *
## factor(root_stress)2 0.00332 **
## WordForminflected:factor(root_stress)2 0.17890
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) WrdFrm fc(_)2
## WrdFrmnflct -0.613
## fctr(rt_s)2 -0.613 0.502
## WrdFrm:(_)2 0.434 -0.709 -0.709
model_a_f0 <- lmer(ratio_mean_f0 ~ WordForm*factor(root_stress) + (1|root) +(1|Speaker), data=cs_roots)
## boundary (singular) fit: see help('isSingular')
summary(model_a_f0)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: ratio_mean_f0 ~ WordForm * factor(root_stress) + (1 | root) +
## (1 | Speaker)
## Data: cs_roots
##
## REML criterion at convergence: -467.7
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.7942 -0.5665 0.0018 0.5803 3.1138
##
## Random effects:
## Groups Name Variance Std.Dev.
## root (Intercept) 0.000000 0.00000
## Speaker (Intercept) 0.001401 0.03743
## Residual 0.011232 0.10598
## Number of obs: 302, groups: root, 80; Speaker, 4
##
## Fixed effects:
## Estimate Std. Error df t value
## (Intercept) 1.08789 0.02250 5.10420 48.345
## WordForminflected -0.03075 0.01738 295.00693 -1.770
## factor(root_stress)2 -0.04211 0.01756 295.07875 -2.399
## WordForminflected:factor(root_stress)2 -0.02429 0.02442 295.05858 -0.995
## Pr(>|t|)
## (Intercept) 5.42e-08 ***
## WordForminflected 0.0778 .
## factor(root_stress)2 0.0171 *
## WordForminflected:factor(root_stress)2 0.3207
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) WrdFrm fc(_)2
## WrdFrmnflct -0.399
## fctr(rt_s)2 -0.395 0.512
## WrdFrm:(_)2 0.284 -0.712 -0.719
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')
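The `boundary (singular) fit` message above corresponds to the `root` variance being estimated as exactly 0 in the F0 model. A minimal sketch of how one might check for this condition and inspect the variance components follows. It uses the `sleepstudy` data bundled with lme4 rather than `cs_roots`, so the object names are illustrative assumptions, not part of the original analysis.

```r
library(lme4)

# Illustrative fit on lme4's bundled sleepstudy data (not the study data)
fit_full <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)

# TRUE if any variance component is estimated at (or very near) zero
singular_flag <- isSingular(fit_full, tol = 1e-4)

# Inspect the estimated variance components directly
vc <- as.data.frame(VarCorr(fit_full))
print(vc[, c("grp", "vcov", "sdcor")])
print(singular_flag)
```

For `model_a_f0`, the analogous call would be `isSingular(model_a_f0)`. If it returns `TRUE`, refitting without `(1 | root)` and checking that the fixed-effect estimates stay stable is one reasonable sanity check.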
OBSERVATION: Model_A
Model A: Duration Ratio
Formula: ratio_s1_s2_dur ~ WordForm * factor(root_stress) + (1|root) + (1|Speaker)
| Term | Estimate | p-value | Interpretation |
|---|---|---|---|
| (Intercept) | 1.095 | < .001 | Baseline duration ratio for uninflected words with stress on s1 |
| WordForminflected | 0.092 | .514 | Inflected words show slightly higher s1:s2 duration ratio (NS) |
| factor(root_stress)2 | –0.283 | .047 | Stress on s2 results in lower s1:s2 duration ratio (s1 becomes shorter) |
| WordForminflected:factor(root_stress)2 | 0.097 | .625 | No significant interaction effect |
Model A: Mean Intensity Ratio
Formula: ratio_mean_int ~ WordForm * factor(root_stress) + (1|root) + (1|Speaker)
| Term | Estimate | p-value | Interpretation |
|---|---|---|---|
| (Intercept) | 1.074 | < .001 | Baseline intensity ratio for uninflected words with stress on s1 |
| WordForminflected | –0.057 | .015 | Inflected words show a significantly lower s1:s2 intensity ratio |
| factor(root_stress)2 | –0.069 | .003 | Stress on s2 lowers the s1:s2 intensity ratio |
| WordForminflected:factor(root_stress)2 | 0.044 | .179 | No significant interaction effect |
Model A: Mean F0 Ratio
Formula: ratio_mean_f0 ~ WordForm * factor(root_stress) + (1|root) + (1|Speaker)
| Term | Estimate | p-value | Interpretation |
|---|---|---|---|
| (Intercept) | 1.088 | < .001 | Baseline F0 ratio for uninflected words with stress on s1 |
| WordForminflected | –0.031 | .078 | Inflected words show marginally lower F0 ratio (not quite significant) |
| factor(root_stress)2 | –0.042 | .017 | Stress on s2 lowers the s1:s2 F0 ratio |
| WordForminflected:factor(root_stress)2 | –0.024 | .321 | No significant interaction effect |
Summary
| Feature | Mean Duration | Mean Intensity | Mean F0 |
|---|---|---|---|
| Stress position effect | Significant (decrease in s1:s2 ratio) | Significant (decrease in s1:s2 ratio) | Significant (decrease in s1:s2 ratio) |
| Inflection effect | Not Significant | Significant (decrease in s1:s2 ratio) | Not Significant |
| **Interaction (Stress*WordForm)** | Not Significant | Not Significant | Not Significant |
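Because `WordForm` and `root_stress` are treatment-coded and interact, each coefficient is a difference relative to the reference cell (uninflected, stress on s1), so `WordForminflected` is the inflection effect for s1-stressed roots specifically. As a worked illustration, the sketch below reconstructs the four predicted cell means of the duration ratio from the fixed-effect estimates reported above; the variable names are ours.

```r
# Fixed-effect estimates copied from summary(model_a_dur)
b_intercept   <- 1.09456   # uninflected, root stress on s1
b_inflected   <- 0.09193   # WordForminflected
b_stress2     <- -0.28313  # factor(root_stress)2
b_interaction <- 0.09705   # WordForminflected:factor(root_stress)2

# Predicted s1:s2 duration ratio for each combination of the two factors
cell_means <- c(
  uninflected_s1 = b_intercept,
  inflected_s1   = b_intercept + b_inflected,
  uninflected_s2 = b_intercept + b_stress2,
  inflected_s2   = b_intercept + b_inflected + b_stress2 + b_interaction
)
print(round(cell_means, 3))
```

A ratio above 1 means s1 is longer than s2. Under these estimates, s1-stressed roots stay above 1 in both word forms, while s2-stressed roots fall below 1 when uninflected and sit close to 1 when inflected.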
## Plot B
cs_all_syll <- df_full_sample %>%
filter(Language == "CS") %>%
filter(WordForm == "inflected") %>%
filter(!is.na(SyllPos))
view(cs_all_syll)
# Plot B with error bars: duration, intensity, and F0 of s1, s2, s3 by stressed syllable (inflected CS words only)
# Summarize duration, intensity, and F0
summary_cs_all <- cs_all_syll %>%
group_by(StressedSyll, SyllPos) %>%
summarise(
mean_dur = mean(Duration_in_ms, na.rm = TRUE),
sd_dur = sd(Duration_in_ms, na.rm = TRUE),
mean_dB = mean(Mean_dB, na.rm = TRUE),
sd_dB = sd(Mean_dB, na.rm = TRUE),
mean_f0 = mean(MeanF0, na.rm = TRUE),
sd_f0 = sd(MeanF0, na.rm = TRUE),
n = n(),
.groups = "drop"
) %>%
mutate(
se_dur = sd_dur / sqrt(n),
se_dB = sd_dB / sqrt(n),
se_f0 = sd_f0 / sqrt(n)
)
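One detail worth checking in the summary above: `n = n()` counts every row in a group, including rows where a measure is `NA`, while `mean()` and `sd()` drop those rows via `na.rm = TRUE`. If a group contains missing F0 values (the import log reports NAs introduced by coercion for `MeanF0`), dividing by `sqrt(n)` slightly understates the standard error. A minimal base-R sketch with made-up numbers shows the difference; `se_non_missing` is a hypothetical helper, not part of the original script.

```r
# Standard error using only the non-missing observations
se_non_missing <- function(x) {
  n_valid <- sum(!is.na(x))
  sd(x, na.rm = TRUE) / sqrt(n_valid)
}

# Made-up F0 values with two missing entries
f0 <- c(150, 160, NA, 170, NA, 180)

se_all_rows <- sd(f0, na.rm = TRUE) / sqrt(length(f0))  # what n = n() implies
se_valid    <- se_non_missing(f0)

print(c(all_rows = se_all_rows, valid_only = se_valid))
```

Inside `summarise()`, the same idea would be a per-measure count such as `n_f0 = sum(!is.na(MeanF0))` alongside the other columns.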
# Neutral grey palette
#grey_palette <- c("s1" = "#999999", "s2" = "#666666", "s3" = "#333333")
# Shared minimalist theme (legend removed for first 2 plots)
shared_theme <- theme_minimal(base_size = 18) +
theme(
axis.title = element_text(size = 18),
axis.text = element_text(size = 16),
legend.title = element_blank(),
legend.text = element_text(size = 10),
legend.position = "none"
)
# Plot 1: Duration
cs_s1s2s3_plot <- ggplot(summary_cs_all, aes(x = factor(StressedSyll), y = mean_dur, fill = SyllPos)) +
geom_bar(stat = "identity", position = position_dodge(width = 0.9)) +
geom_errorbar(
aes(ymin = mean_dur - se_dur, ymax = mean_dur + se_dur),
position = position_dodge(width = 0.9),
width = 0.2
) +
# scale_fill_manual(values = grey_palette) +
labs(
x = "Root Stress",
y = "Duration (ms)"
) +
shared_theme
# Plot 2: Intensity
cs_s1s2s3_intensity_plot <- ggplot(summary_cs_all, aes(x = factor(StressedSyll), y = mean_dB, fill = SyllPos)) +
geom_bar(stat = "identity", position = position_dodge(width = 0.9)) +
geom_errorbar(
aes(ymin = mean_dB - se_dB, ymax = mean_dB + se_dB),
position = position_dodge(width = 0.9),
width = 0.2
) +
# scale_fill_manual(values = grey_palette) +
labs(
x = "Root Stress",
y = "Intensity (dB)"
) +
shared_theme
# Plot 3: F0 (with legend)
cs_s1s2s3_f0_plot <- ggplot(summary_cs_all, aes(x = factor(StressedSyll), y = mean_f0, fill = SyllPos)) +
geom_bar(stat = "identity", position = position_dodge(width = 0.9)) +
geom_errorbar(
aes(ymin = mean_f0 - se_f0, ymax = mean_f0 + se_f0),
position = position_dodge(width = 0.9),
width = 0.2
) +
# scale_fill_manual(values = grey_palette) +
labs(
x = "Root Stress",
y = "F0 (Hz)",
fill = "Syllable Position"
) +
shared_theme
# Combine horizontally with shared legend
cs_s1s2s3_plot_clean <- cs_s1s2s3_plot + theme(axis.title.x = element_blank())
cs_s1s2s3_f0_plot_clean <- cs_s1s2s3_f0_plot + theme(axis.title.x = element_blank())
cs_panel_horizontal <- (cs_s1s2s3_plot_clean | cs_s1s2s3_intensity_plot | cs_s1s2s3_f0_plot_clean) +
plot_layout(ncol = 3, guides = "collect") &
theme(legend.position = "bottom")
print(cs_panel_horizontal)
# Save output
ggsave("cs_s1s2s3_panel_horizontal.png", plot = cs_panel_horizontal, width = 15, height = 5, dpi = 400, bg = "white")
# Print for inspection
# Create vertical layout
# Duration plot (no x-axis label, no legend)
# cs_s1s2s3_plot_clean <- cs_s1s2s3_plot +
# labs(x = NULL) +
# theme(legend.position = "none")
# Intensity plot (no x-axis label, no legend)
# cs_s1s2s3_intensity_plot_clean <- cs_s1s2s3_intensity_plot +
# labs(x = NULL) +
# theme(legend.position = "none")
# F0 plot (with x-axis label and shared legend)
# cs_s1s2s3_f0_plot_clean <- cs_s1s2s3_f0_plot +
# labs(x = "Root Stress Position") +
# theme(legend.position = "bottom")
# Combine plots vertically
# cs_panel_vertical_clean <- cs_s1s2s3_plot_clean /
# cs_s1s2s3_intensity_plot_clean /
# cs_s1s2s3_f0_plot_clean +
# plot_layout(guides = "collect") &
# theme(legend.position = "bottom")
# Save cleaned panel
# ggsave("cs_s1s2s3_panel_vertical_clean.png", plot = cs_panel_vertical_clean,
# width = 7, height = 12, dpi = 400, bg = "white")
# Print for review
# print(cs_panel_vertical_clean)
# Test H2: Stress follows Kazakh rules
# s3 would have significantly longer duration than s1 and s2 if H2 is true.
# Dataset contains durations of s1, s2, and s3
head(cs_all_syll)
# Initial model b for comparing s1,s2 and s3
# model_b <- lmer(Duration_in_ms ~ SyllPos*Stress + (1|Speaker), data=cs_all_syll)
# summary(model_b)
# reference level - s1 stressed vs s3
# reference level - s2 stressed vs s3
# comparing positional difference based on root stress
# new code below taking into account above comments:
## s1 vs s3
# Recode NA as "no_stress"
cs_all_syll$Stress <- as.character(cs_all_syll$Stress)
cs_all_syll$Stress[is.na(cs_all_syll$Stress)] <- "no_stress"
cs_all_syll$Stress <- factor(cs_all_syll$Stress, levels = c("stressed", "unstressed", "no_stress"))
# Filter only s1 and s3 rows
cs_s1_s3_new <- cs_all_syll %>%
filter(SyllPos %in% c("s1", "s3"))
# Identify words where s1 is stressed
words_with_stressed_s1 <- cs_s1_s3_new %>%
filter(SyllPos == "s1", Stress == "stressed") %>%
pull(Word) %>% unique()
# Keep s1 and s3 syllables only from those words
cs_s1_s3_stressed_new <- cs_s1_s3_new %>%
filter(Word %in% words_with_stressed_s1)
cs_s1_s3_stressed_new
# Fit the model (model B_1, stressed s1 vs. s3 no_stress)
model_s1_vs_s3_dur <- lmer(Duration_in_ms ~ factor(SyllPos) + (1 | Speaker), data = cs_s1_s3_stressed_new)
## boundary (singular) fit: see help('isSingular')
summary(model_s1_vs_s3_dur)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: Duration_in_ms ~ factor(SyllPos) + (1 | Speaker)
## Data: cs_s1_s3_stressed_new
##
## REML criterion at convergence: 1845.7
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.2060 -0.7044 -0.0497 0.4628 3.5991
##
## Random effects:
## Groups Name Variance Std.Dev.
## Speaker (Intercept) 0 0.00
## Residual 4589 67.74
## Number of obs: 165, groups: Speaker, 4
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 222.8084 7.3910 163.0000 30.15 <2e-16 ***
## factor(SyllPos)s3 0.5325 10.5487 163.0000 0.05 0.96
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## fctr(SylP)3 -0.701
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')
model_s1_vs_s3_int <- lmer(Mean_dB ~ factor(SyllPos) + (1 | Speaker), data = cs_s1_s3_stressed_new)
summary(model_s1_vs_s3_int)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: Mean_dB ~ factor(SyllPos) + (1 | Speaker)
## Data: cs_s1_s3_stressed_new
##
## REML criterion at convergence: 969
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.6219 -0.4066 0.0381 0.6665 2.0886
##
## Random effects:
## Groups Name Variance Std.Dev.
## Speaker (Intercept) 3.058 1.749
## Residual 20.414 4.518
## Number of obs: 165, groups: Speaker, 4
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 70.1355 1.0037 3.8656 69.877 3.85e-07 ***
## factor(SyllPos)s3 -2.7805 0.7036 160.0103 -3.952 0.000116 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## fctr(SylP)3 -0.344
model_s1_vs_s3_f0 <- lmer(MeanF0 ~ factor(SyllPos) + (1 | Speaker), data = cs_s1_s3_stressed_new)
summary(model_s1_vs_s3_f0)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: MeanF0 ~ factor(SyllPos) + (1 | Speaker)
## Data: cs_s1_s3_stressed_new
##
## REML criterion at convergence: 1445.8
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.3310 -0.5438 -0.0116 0.7014 2.5875
##
## Random effects:
## Groups Name Variance Std.Dev.
## Speaker (Intercept) 3590.1 59.92
## Residual 353.1 18.79
## Number of obs: 165, groups: Speaker, 4
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 160.376 30.029 3.014 5.341 0.0127 *
## factor(SyllPos)s3 -2.833 2.927 160.000 -0.968 0.3344
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## fctr(SylP)3 -0.048
## s2 vs s3
# Filter only s2 and s3 rows
cs_s2_s3_new <- cs_all_syll %>%
filter(SyllPos %in% c("s2", "s3"))
# Identify words where s2 is stressed
words_with_stressed_s2 <- cs_s2_s3_new %>%
filter(SyllPos == "s2", Stress == "stressed") %>%
pull(Word) %>% unique()
# Keep s2 and s3 syllables only from those words
cs_s2_s3_stressed_new <- cs_s2_s3_new %>%
filter(Word %in% words_with_stressed_s2)
cs_s2_s3_stressed_new
# Fit the model (model B_2, stressed s2 vs. s3 no_stress)
model_s2_vs_s3_dur <- lmer(Duration_in_ms ~ factor(SyllPos) + (1 | Speaker), data = cs_s2_s3_stressed_new)
summary(model_s2_vs_s3_dur)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: Duration_in_ms ~ factor(SyllPos) + (1 | Speaker)
## Data: cs_s2_s3_stressed_new
##
## REML criterion at convergence: 1856.3
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.0340 -0.7360 -0.0397 0.6178 3.3429
##
## Random effects:
## Groups Name Variance Std.Dev.
## Speaker (Intercept) 466.2 21.59
## Residual 3860.2 62.13
## Number of obs: 168, groups: Speaker, 4
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 238.700 12.748 4.067 18.725 4.23e-05 ***
## factor(SyllPos)s3 2.131 9.587 163.000 0.222 0.824
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## fctr(SylP)3 -0.376
model_s2_vs_s3_int <- lmer(Mean_dB ~ factor(SyllPos) + (1 | Speaker), data = cs_s2_s3_stressed_new)
summary(model_s2_vs_s3_int)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: Mean_dB ~ factor(SyllPos) + (1 | Speaker)
## Data: cs_s2_s3_stressed_new
##
## REML criterion at convergence: 902.6
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.0611 -0.5299 0.0471 0.6822 2.2831
##
## Random effects:
## Groups Name Variance Std.Dev.
## Speaker (Intercept) 3.536 1.881
## Residual 12.174 3.489
## Number of obs: 168, groups: Speaker, 4
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 70.0537 1.0144 3.4714 69.059 1.43e-06 ***
## factor(SyllPos)s3 -0.8140 0.5384 163.0000 -1.512 0.133
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## fctr(SylP)3 -0.265
model_s2_vs_s3_f0 <- lmer(MeanF0 ~ factor(SyllPos) + (1 | Speaker), data = cs_s2_s3_stressed_new)
summary(model_s2_vs_s3_f0)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: MeanF0 ~ factor(SyllPos) + (1 | Speaker)
## Data: cs_s2_s3_stressed_new
##
## REML criterion at convergence: 1328.8
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -6.5301 -0.4348 -0.0686 0.6563 2.5248
##
## Random effects:
## Groups Name Variance Std.Dev.
## Speaker (Intercept) 4183.4 64.68
## Residual 146.3 12.09
## Number of obs: 168, groups: Speaker, 4
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 154.460 32.367 3.005 4.772 0.0174 *
## factor(SyllPos)s3 9.618 1.866 163.000 5.154 7.31e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## fctr(SylP)3 -0.029
OBSERVATION: Model_B
Model B: Duration (s1_stressed vs s3)
Formula: Duration_in_ms ~ factor(SyllPos) + (1 | Speaker)
| Term | Estimate | p-value | Interpretation |
|---|---|---|---|
| (Intercept) | 222.81 | < .001 | Baseline duration for s1 syllables is approximately 223 ms |
| factor(SyllPos)s3 | 0.53 | .960 | No significant duration difference; s3 duration is nearly identical to s1 |
Model B: Intensity (s1_stressed vs s3)
Formula: Mean_dB ~ factor(SyllPos) + (1 | Speaker)
| Term | Estimate | p-value | Interpretation |
|---|---|---|---|
| (Intercept) | 70.14 | < .001 | Baseline intensity for s1 syllables is approximately 70.1 dB |
| factor(SyllPos)s3 | –2.78 | < .001 | s3 syllables are significantly less intense than s1 by ~2.8 dB |
Model B: F0 (s1_stressed vs s3)
Formula: MeanF0 ~ factor(SyllPos) + (1 | Speaker)
| Term | Estimate | p-value | Interpretation |
|---|---|---|---|
| (Intercept) | 160.38 | .013 | Baseline F0 for s1 syllables is approximately 160 Hz |
| factor(SyllPos)s3 | –2.83 | .334 | s3 syllables show no significant F0 difference compared to s1 |
Summary
| Feature | Duration | Intensity | F0 |
|---|---|---|---|
| s3 effect | Not significant (p = .960) | Significant (p < .001) | Not significant (p = .334) |
| Estimate | +0.53 ms | –2.78 dB | –2.83 Hz |
| Interpretation | No change from s1 | s3 has lower intensity | No meaningful F0 difference |
Model B: Duration (s2_stressed vs s3)
Formula: Duration_in_ms ~ factor(SyllPos) + (1 | Speaker)
| Term | Estimate | p-value | Interpretation |
|---|---|---|---|
| (Intercept) | 238.70 | < .001 | Baseline duration for s2 syllables is approximately 239 ms |
| factor(SyllPos)s3 | 2.13 | .824 | s3 duration is not significantly different from s2 |
Model B: Intensity (s2_stressed vs s3)
Formula: Mean_dB ~ factor(SyllPos) + (1 | Speaker)
| Term | Estimate | p-value | Interpretation |
|---|---|---|---|
| (Intercept) | 70.05 | < .001 | Baseline intensity for s2 syllables is approximately 70.1 dB |
| factor(SyllPos)s3 | –0.81 | .133 | s3 is ~0.8 dB less intense, but this difference is not significant |
Model B: F0 (s2_stressed vs s3)
Formula: MeanF0 ~ factor(SyllPos) + (1 | Speaker)
| Term | Estimate | p-value | Interpretation |
|---|---|---|---|
| (Intercept) | 154.46 | .017 | Baseline F0 for s2 syllables is approximately 154 Hz |
| factor(SyllPos)s3 | +9.62 | < .001 | s3 syllables show a significantly higher F0 (~9.6 Hz) compared to s2 |
Summary
| Feature | Duration | Intensity | F0 |
|---|---|---|---|
| s3 effect | Not significant (p = .824) | Not significant (p = .133) | Significant increase (p < .001) |
| Estimate | +2.13 ms | –0.81 dB | +9.62 Hz |
| Interpretation | No meaningful change | No meaningful change | s3 has notably higher pitch |
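Model B involves six separate tests (three measures across two comparisons). As a hedged illustration that is not part of the original analysis, the p-values reported above can be adjusted for multiple comparisons with base R's `p.adjust()`; the Holm method is our choice for this sketch.

```r
# p-values copied from the Model B summaries above
p_raw <- c(
  s1_s3_dur = 0.96,
  s1_s3_int = 0.000116,
  s1_s3_f0  = 0.3344,
  s2_s3_dur = 0.824,
  s2_s3_int = 0.133,
  s2_s3_f0  = 7.31e-07
)

# Holm step-down adjustment across all six tests
p_holm <- p.adjust(p_raw, method = "holm")
significant <- names(p_holm)[p_holm < 0.05]

print(signif(p_holm, 3))
print(significant)
```

Under this adjustment the same two effects remain below .05: lower intensity on s3 relative to stressed s1, and higher F0 on s3 relative to stressed s2.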
## Plot C == s3 difference in Kaz and CS tokens
kz_cs_df <- df_full_sample %>%
filter(Language %in% c("CS", "Kaz"),
SyllPos == 's3') %>%
filter(!is.na(SyllPos))
#view(kz_cs_df)
summary_kz_cs <- kz_cs_df %>%
group_by(Language) %>%
summarise(
mean_dur = mean(Duration_in_ms, na.rm = TRUE),
sd_dur = sd(Duration_in_ms, na.rm = TRUE),
mean_dB = mean(Mean_dB, na.rm = TRUE),
sd_dB = sd(Mean_dB, na.rm = TRUE),
mean_f0 = mean(MeanF0, na.rm = TRUE),
sd_f0 = sd(MeanF0, na.rm = TRUE),
n = n(),
.groups = "drop"
) %>%
mutate(
se_dur = sd_dur / sqrt(n),
se_dB = sd_dB / sqrt(n),
se_f0 = sd_f0 / sqrt(n)
)
# Shared color palette and theme
# fill_colors <- c("Kaz" = "#666666", "CS" = "#a6cee3")
base_theme <- theme_minimal(base_size = 16) +
theme(
axis.title = element_text(size = 16),
axis.text = element_text(size = 14),
legend.position = "none"
)
# Duration plot
p_dur <- ggplot(summary_kz_cs, aes(x = Language, y = mean_dur, fill = Language)) +
geom_bar(stat = "identity", position = position_dodge(0.4), width = 0.4) +
geom_errorbar(aes(ymin = mean_dur - se_dur, ymax = mean_dur + se_dur),
width = 0.2, position = position_dodge(0.4)) +
# scale_fill_manual(values = fill_colors) +
labs(y = "Mean Duration (ms)", x = NULL) +
base_theme
# Intensity plot
p_dB <- ggplot(summary_kz_cs, aes(x = Language, y = mean_dB, fill = Language)) +
geom_bar(stat = "identity", position = position_dodge(0.4), width = 0.4) +
geom_errorbar(aes(ymin = mean_dB - se_dB, ymax = mean_dB + se_dB),
width = 0.2, position = position_dodge(0.4)) +
# scale_fill_manual(values = fill_colors) +
labs(y = "Mean Intensity (dB)", x = NULL) +
base_theme
# F0 plot
p_f0 <- ggplot(summary_kz_cs, aes(x = Language, y = mean_f0, fill = Language)) +
geom_bar(stat = "identity", position = position_dodge(0.4), width = 0.4) +
geom_errorbar(aes(ymin = mean_f0 - se_f0, ymax = mean_f0 + se_f0),
width = 0.2, position = position_dodge(0.4)) +
# scale_fill_manual(values = fill_colors) +
labs(y = "Mean F0 (Hz)", x = NULL) +
base_theme
# Horizontal panel with A, B, C annotations
panel_horizontal <- p_dur + p_dB + p_f0 +
plot_layout(ncol = 3, guides = "collect") &
theme(legend.position = "bottom")
# plot_annotation(tag_levels = 'A')
panel_horizontal <- panel_horizontal + plot_annotation(
title = NULL,
subtitle = NULL,
caption = "Language"
)
print(panel_horizontal)
ggsave("kz_cs_panel_horizontal_tagged.png", panel_horizontal, width = 15, height = 5, dpi = 400, bg = "white")
# Vertical panel with A, B, C annotations
# panel_vertical <- p_dur / p_dB / p_f0 +
# plot_layout(ncol = 1, guides = "collect") +
# plot_annotation(tag_levels = 'A')
#
# ggsave("kz_cs_panel_vertical_tagged.png", panel_vertical, width = 6, height = 12, dpi = 400, bg = "white")
# Print both for preview
#print(panel_vertical )
# Test H3: A mix of Kazakh and Russian stress
# dataset contains duration of s3 only for Kaz and CS
# Duration of s3 by Language
head(kz_cs_df)
model_c_dur <- lmer(Duration_in_ms ~ factor(Language) + (1|Speaker), data=kz_cs_df)
summary(model_c_dur)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: Duration_in_ms ~ factor(Language) + (1 | Speaker)
## Data: kz_cs_df
##
## REML criterion at convergence: 3365
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.6393 -0.6892 -0.0484 0.6634 3.4469
##
## Random effects:
## Groups Name Variance Std.Dev.
## Speaker (Intercept) 83.5 9.138
## Residual 2801.2 52.926
## Number of obs: 313, groups: Speaker, 4
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 243.438 6.231 5.077 39.066 1.71e-07 ***
## factor(Language)CS -10.117 5.983 308.003 -1.691 0.0919 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## fctr(Lng)CS -0.482
model_c_int <- lmer(Mean_dB ~ factor(Language) + (1|Speaker), data=kz_cs_df)
summary(model_c_int)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: Mean_dB ~ factor(Language) + (1 | Speaker)
## Data: kz_cs_df
##
## REML criterion at convergence: 1748.6
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.5323 -0.4373 0.1642 0.6724 1.9683
##
## Random effects:
## Groups Name Variance Std.Dev.
## Speaker (Intercept) 3.858 1.964
## Residual 15.224 3.902
## Number of obs: 313, groups: Speaker, 4
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 68.6356 1.0306 3.2988 66.600 2.82e-06 ***
## factor(Language)CS -0.4229 0.4411 308.0027 -0.959 0.338
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## fctr(Lng)CS -0.215
model_c_f0 <- lmer(MeanF0 ~ factor(Language) + (1|Speaker), data=kz_cs_df)
summary(model_c_f0)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: MeanF0 ~ factor(Language) + (1 | Speaker)
## Data: kz_cs_df
##
## REML criterion at convergence: 2611.2
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -5.0077 -0.4203 0.0517 0.6083 2.3689
##
## Random effects:
## Groups Name Variance Std.Dev.
## Speaker (Intercept) 3820.0 61.81
## Residual 234.4 15.31
## Number of obs: 313, groups: Speaker, 4
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 168.295 30.928 3.005 5.442 0.0121 *
## factor(Language)CS -7.169 1.731 308.000 -4.142 4.44e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## fctr(Lng)CS -0.028
OBSERVATION: Model_C
Model C: Duration
Formula: Duration_in_ms ~ factor(Language) + (1 | Speaker)
| Term | Estimate | p-value | Interpretation |
|---|---|---|---|
| (Intercept) | 243.44 | < 0.001 | Baseline duration for Kazakh tokens: ~243 ms |
| factor(Language)CS | –10.12 | 0.092 | CS tokens are ~10 ms shorter, marginally significant |
Model C: Mean Intensity
Formula: Mean_dB ~ factor(Language) + (1 | Speaker)
| Term | Estimate | p-value | Interpretation |
|---|---|---|---|
| (Intercept) | 68.64 | < 0.001 | Baseline mean intensity for Kazakh tokens: ~68.6 dB |
| factor(Language)CS | –0.42 | 0.338 | CS tokens show slightly lower intensity, not significant |
Model C: Mean F0
Formula: MeanF0 ~ factor(Language) + (1 | Speaker)
| Term | Estimate | p-value | Interpretation |
|---|---|---|---|
| (Intercept) | 168.30 | 0.012 | Baseline mean F0 for Kazakh tokens: ~168 Hz |
| factor(Language)CS | –7.17 | < 0.001 | CS tokens show significantly lower F0 (~7 Hz drop) |
Summary
| Feature | Language Effect | Interpretation |
|---|---|---|
| Duration | Marginal | CS tokens trend shorter than Kazakh (~10 ms diff) |
| Mean Intensity | Not significant | No notable difference across languages |
| Mean F0 | Significant | CS tokens show a clear F0 drop (~7 Hz lower) |
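To make the "marginal" duration effect concrete, an approximate 95% confidence interval can be built from the estimate, standard error, and Satterthwaite degrees of freedom printed by `summary(model_c_dur)`. This is a sketch using the printed values only; `wald_ci` is a hypothetical helper, not a function from lme4 or lmerTest.

```r
# t-based interval from an estimate, its standard error, and its df
wald_ci <- function(estimate, se, df, level = 0.95) {
  crit <- qt(1 - (1 - level) / 2, df)
  c(lower = estimate - crit * se, upper = estimate + crit * se)
}

# Values copied from the model summaries above
ci_dur <- wald_ci(-10.117, 5.983, 308.003)  # factor(Language)CS, duration (ms)
ci_f0  <- wald_ci(-7.169, 1.731, 308.000)   # factor(Language)CS, mean F0 (Hz)

print(round(ci_dur, 2))
print(round(ci_f0, 2))
```

The duration interval spans zero while the F0 interval lies entirely below zero, which matches the p-values in the tables.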
cs_all_syll_shift <- df_full_sample %>%
filter(Language == "CS") %>%
filter(ShiftDirect %in% c("forward", "na")) %>%
mutate(ShiftDirect = recode(ShiftDirect,
"forward" = "mobile",
"na" = "fixed")) %>%
filter(!is.na(SyllPos))
# filter(StressShift == "no") %>%
# sample_n(size = 59)
# view(cs_all_syll)
# Plot with error bars: duration, intensity, and F0 of s1, s2, s3 by stressed syllable and shift type (fixed vs. mobile)
# Summarize duration, intensity, and F0
summary_cs_all_shift <- cs_all_syll_shift %>%
group_by(StressedSyll, SyllPos,ShiftDirect) %>%
summarise(
mean_dur = mean(Duration_in_ms, na.rm = TRUE),
sd_dur = sd(Duration_in_ms, na.rm = TRUE),
mean_dB = mean(Mean_dB, na.rm = TRUE),
sd_dB = sd(Mean_dB, na.rm = TRUE),
mean_f0 = mean(MeanF0, na.rm = TRUE),
sd_f0 = sd(MeanF0, na.rm = TRUE),
n = n(),
.groups = "drop"
) %>%
mutate(
se_dur = sd_dur / sqrt(n),
se_dB = sd_dB / sqrt(n),
se_f0 = sd_f0 / sqrt(n)
)
# Neutral grey palette
#grey_palette <- c("s1" = "#999999", "s2" = "#666666", "s3" = "#333333")
# Shared minimalist theme (legend removed for first 2 plots)
shared_theme <- theme_minimal(base_size = 18) +
theme(
axis.title = element_text(size = 18),
axis.text = element_text(size = 16),
legend.title = element_blank(),
legend.text = element_text(size = 10),
legend.position = "none"
)
# Plot 1: Duration
cs_s1s2s3_plot <- ggplot(summary_cs_all_shift, aes(x = factor(StressedSyll), y = mean_dur, fill = SyllPos)) +
facet_wrap(~ShiftDirect) +
geom_bar(stat = "identity", position = position_dodge(width = 0.9)) +
geom_errorbar(
aes(ymin = mean_dur - se_dur, ymax = mean_dur + se_dur),
position = position_dodge(width = 0.9),
width = 0.2
) +
# scale_fill_manual(values = grey_palette) +
labs(
x = "Root Stress",
y = "Duration (ms)"
) +
shared_theme
print(cs_s1s2s3_plot)
# Plot 2: Intensity
cs_s1s2s3_intensity_plot <- ggplot(summary_cs_all_shift, aes(x = factor(StressedSyll), y = mean_dB, fill = SyllPos)) +
facet_wrap(~ShiftDirect) +
geom_bar(stat = "identity", position = position_dodge(width = 0.9)) +
geom_errorbar(
aes(ymin = mean_dB - se_dB, ymax = mean_dB + se_dB),
position = position_dodge(width = 0.9),
width = 0.2
) +
# scale_fill_manual(values = grey_palette) +
labs(
x = "Root Stress",
y = "Intensity (dB)"
) +
shared_theme
print(cs_s1s2s3_intensity_plot)
# Plot 3: F0 (with legend)
cs_s1s2s3_f0_plot <- ggplot(summary_cs_all_shift, aes(x = factor(StressedSyll), y = mean_f0, fill = SyllPos)) +
facet_wrap(~ShiftDirect) +
geom_bar(stat = "identity", position = position_dodge(width = 0.9)) +
geom_errorbar(
aes(ymin = mean_f0 - se_f0, ymax = mean_f0 + se_f0),
position = position_dodge(width = 0.9),
width = 0.2
) +
# scale_fill_manual(values = grey_palette) +
labs(
x = "Root Stress",
y = "F0 (Hz)",
fill = "Syllable Position"
) +
shared_theme
print(cs_s1s2s3_f0_plot)
view(cs_roots_balanced)
# The reference level is 'uninflected'; 'inflected' word forms are compared against it.
# Convert WordForm to a factor, since it has two categorical levels
cs_roots_balanced$WordForm <- factor(cs_roots_balanced$WordForm)
# Set ref level
cs_roots_balanced$WordForm <- relevel(cs_roots_balanced$WordForm, ref = "uninflected")
cs_roots_balanced_fixed <- cs_roots_balanced %>%
filter(ShiftDirect == "fixed")
cs_roots_balanced_mobile <- cs_roots_balanced %>%
filter(ShiftDirect == "mobile")
# Model predictions for fixed roots
# Our prediction is that stress remains on the root
model_d_dur <- lmer(ratio_s1_s2_dur ~ WordForm*factor(root_stress) + (1|root) +(1|Speaker), data=cs_roots_balanced_fixed)
summary(model_d_dur)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: ratio_s1_s2_dur ~ WordForm * factor(root_stress) + (1 | root) +
## (1 | Speaker)
## Data: cs_roots_balanced_fixed
##
## REML criterion at convergence: 57.8
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.85222 -0.48912 -0.02762 0.38907 2.02330
##
## Random effects:
## Groups Name Variance Std.Dev.
## root (Intercept) 0.165655 0.40701
## Speaker (Intercept) 0.007166 0.08465
## Residual 0.046051 0.21459
## Number of obs: 59, groups: root, 35; Speaker, 4
##
## Fixed effects:
## Estimate Std. Error df t value
## (Intercept) 1.16942 0.14071 29.60293 8.311
## WordForminflected -0.01651 0.19095 30.38455 -0.086
## factor(root_stress)2 -0.59189 0.21506 29.53055 -2.752
## WordForminflected:factor(root_stress)2 0.06397 0.31146 29.37809 0.205
## Pr(>|t|)
## (Intercept) 3.13e-09 ***
## WordForminflected 0.932
## factor(root_stress)2 0.010 *
## WordForminflected:factor(root_stress)2 0.839
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) WrdFrm fc(_)2
## WrdFrmnflct -0.669
## fctr(rt_s)2 -0.597 0.435
## WrdFrm:(_)2 0.409 -0.610 -0.684
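As a reading aid for the interaction model, the sketch below rebuilds the predicted s1/s2 duration ratio for each `WordForm` by `root_stress` cell from the fixed-effect estimates printed above. The numbers are copied by hand from that output; the object names (`b`, `cell_means`) are illustrative, and random effects are set to zero.

```r
# Fixed-effect estimates copied from the summary(model_d_dur) output above.
b <- c(intercept = 1.16942, inflected = -0.01651,
       stress2 = -0.59189, inflected_x_stress2 = 0.06397)

# Predicted s1/s2 duration ratio per cell (matrix is filled column by column).
cell_means <- matrix(
  c(b[["intercept"]],                       # uninflected, root_stress = 1
    b[["intercept"]] + b[["inflected"]],    # inflected,   root_stress = 1
    b[["intercept"]] + b[["stress2"]],      # uninflected, root_stress = 2
    sum(b)),                                # inflected,   root_stress = 2
  nrow = 2,
  dimnames = list(WordForm = c("uninflected", "inflected"),
                  root_stress = c("1", "2"))
)
round(cell_means, 3)
```

Under these estimates the ratio is about 1.17 (uninflected) and 1.15 (inflected) when `root_stress` is 1, and about 0.58 and 0.62 when it is 2. The difference between the columns is the root-stress effect; the difference between the rows is negligible.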
model_d_int <- lmer(ratio_mean_int ~ WordForm*factor(root_stress) + (1|root) +(1|Speaker), data=cs_roots_balanced_fixed)
summary(model_d_int)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: ratio_mean_int ~ WordForm * factor(root_stress) + (1 | root) +
## (1 | Speaker)
## Data: cs_roots_balanced_fixed
##
## REML criterion at convergence: -111.3
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.57735 -0.53982 0.04363 0.34612 2.62401
##
## Random effects:
## Groups Name Variance Std.Dev.
## root (Intercept) 0.0049577 0.07041
## Speaker (Intercept) 0.0005786 0.02405
## Residual 0.0029225 0.05406
## Number of obs: 59, groups: root, 35; Speaker, 4
##
## Fixed effects:
## Estimate Std. Error df t value
## (Intercept) 1.027666 0.027914 23.553046 36.816
## WordForminflected -0.009407 0.036020 32.782154 -0.261
## factor(root_stress)2 0.025803 0.040324 31.097365 0.640
## WordForminflected:factor(root_stress)2 -0.002339 0.058338 30.841543 -0.040
## Pr(>|t|)
## (Intercept) <2e-16 ***
## WordForminflected 0.796
## factor(root_stress)2 0.527
## WordForminflected:factor(root_stress)2 0.968
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) WrdFrm fc(_)2
## WrdFrmnflct -0.629
## fctr(rt_s)2 -0.567 0.431
## WrdFrm:(_)2 0.386 -0.613 -0.679
model_d_f0 <- lmer(ratio_mean_f0 ~ WordForm*factor(root_stress) + (1|root) +(1|Speaker), data=cs_roots_balanced_fixed)
## boundary (singular) fit: see help('isSingular')
summary(model_d_f0)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: ratio_mean_f0 ~ WordForm * factor(root_stress) + (1 | root) +
## (1 | Speaker)
## Data: cs_roots_balanced_fixed
##
## REML criterion at convergence: -97.1
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.77533 -0.65592 -0.07482 0.65051 1.94617
##
## Random effects:
## Groups Name Variance Std.Dev.
## root (Intercept) 7.64e-12 2.764e-06
## Speaker (Intercept) 1.05e-03 3.240e-02
## Residual 7.78e-03 8.821e-02
## Number of obs: 59, groups: root, 35; Speaker, 4
##
## Fixed effects:
## Estimate Std. Error df t value
## (Intercept) 1.11553 0.02647 8.83079 42.150
## WordForminflected -0.03276 0.03092 52.37574 -1.059
## factor(root_stress)2 -0.07224 0.03231 54.36482 -2.236
## WordForminflected:factor(root_stress)2 -0.03662 0.04667 52.64520 -0.785
## Pr(>|t|)
## (Intercept) 1.73e-11 ***
## WordForminflected 0.2943
## factor(root_stress)2 0.0295 *
## WordForminflected:factor(root_stress)2 0.4362
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) WrdFrm fc(_)2
## WrdFrmnflct -0.526
## fctr(rt_s)2 -0.523 0.419
## WrdFrm:(_)2 0.347 -0.653 -0.663
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')
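The `boundary (singular) fit` message matches the random-effects table above, where the `root` variance is estimated at essentially zero (7.64e-12). One way to check whether this matters is to confirm the singularity with `isSingular()` and compare the fixed effects against a model without the `root` intercept. The sketch below does this on simulated stand-in data; `sim`, `m_full`, and `m_reduced` are hypothetical names, and the simulated values are not the study measurements.

```r
library(lme4)

set.seed(1)
# Simulated stand-in for cs_roots_balanced_fixed (hypothetical values):
# 4 speakers x 15 roots, generated with no true root-level variance.
sim <- expand.grid(Speaker = factor(1:4), root = factor(1:15))
speaker_shift <- rnorm(4, sd = 0.05)
sim$ratio <- 1.1 + speaker_shift[as.integer(sim$Speaker)] +
  rnorm(nrow(sim), sd = 0.09)

m_full    <- lmer(ratio ~ 1 + (1 | root) + (1 | Speaker), data = sim)
m_reduced <- lmer(ratio ~ 1 + (1 | Speaker), data = sim)

isSingular(m_full)   # TRUE when a variance component is estimated at (or near) zero
VarCorr(m_full)      # shows which component is near zero

# Fixed-effect estimates with and without the root intercept, side by side.
cbind(full = fixef(m_full), reduced = fixef(m_reduced))
```

If the fixed effects are essentially unchanged, the singular fit is unlikely to affect the interpretation of the fixed-effect estimates.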
# model_d_maxf0 <- lmer(ratio_max_fo ~ ShiftDirect*factor(root_stress) + (1|root) +(1|Speaker), data=cs_roots_balanced)
# summary(model_d_maxf0)
# Model predictions for mobile roots
# Our prediction is that stress shifts to s3; therefore the s1:s2 ratio should decrease
model_d_dur_m <- lmer(ratio_s1_s2_dur ~ WordForm*factor(root_stress) + (1|root) +(1|Speaker), data=cs_roots_balanced_mobile)
summary(model_d_dur_m)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: ratio_s1_s2_dur ~ WordForm * factor(root_stress) + (1 | root) +
## (1 | Speaker)
## Data: cs_roots_balanced_mobile
##
## REML criterion at convergence: 53.8
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.23976 -0.60094 -0.06697 0.31499 2.72281
##
## Random effects:
## Groups Name Variance Std.Dev.
## root (Intercept) 0.20966 0.4579
## Speaker (Intercept) 0.01750 0.1323
## Residual 0.07155 0.2675
## Number of obs: 59, groups: root, 16; Speaker, 4
##
## Fixed effects:
## Estimate Std. Error df t value
## (Intercept) 1.1007 0.1927 12.8666 5.712
## WordForminflected 0.1484 0.2574 11.0138 0.577
## factor(root_stress)2 -0.3173 0.5102 10.6822 -0.622
## WordForminflected:factor(root_stress)2 0.2070 0.7220 10.7057 0.287
## Pr(>|t|)
## (Intercept) 7.44e-05 ***
## WordForminflected 0.576
## factor(root_stress)2 0.547
## WordForminflected:factor(root_stress)2 0.780
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) WrdFrm fc(_)2
## WrdFrmnflct -0.661
## fctr(rt_s)2 -0.333 0.250
## WrdFrm:(_)2 0.236 -0.356 -0.707
model_d_int_m <- lmer(ratio_mean_int ~ WordForm*factor(root_stress) + (1|root) +(1|Speaker), data=cs_roots_balanced_mobile)
summary(model_d_int_m)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: ratio_mean_int ~ WordForm * factor(root_stress) + (1 | root) +
## (1 | Speaker)
## Data: cs_roots_balanced_mobile
##
## REML criterion at convergence: -153.4
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.66362 -0.56755 -0.04577 0.37121 2.70992
##
## Random effects:
## Groups Name Variance Std.Dev.
## root (Intercept) 0.0021722 0.04661
## Speaker (Intercept) 0.0001511 0.01229
## Residual 0.0020940 0.04576
## Number of obs: 59, groups: root, 16; Speaker, 4
##
## Fixed effects:
## Estimate Std. Error df t value
## (Intercept) 1.09386 0.02073 12.85485 52.761
## WordForminflected -0.06060 0.02827 12.41125 -2.143
## factor(root_stress)2 -0.05649 0.05557 11.81606 -1.017
## WordForminflected:factor(root_stress)2 -0.02180 0.07868 11.84631 -0.277
## Pr(>|t|)
## (Intercept) <2e-16 ***
## WordForminflected 0.0525 .
## factor(root_stress)2 0.3297
## WordForminflected:factor(root_stress)2 0.7865
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) WrdFrm fc(_)2
## WrdFrmnflct -0.669
## fctr(rt_s)2 -0.340 0.250
## WrdFrm:(_)2 0.240 -0.359 -0.706
model_d_f0_m <- lmer(ratio_mean_f0 ~ WordForm*factor(root_stress) + (1|root) +(1|Speaker), data=cs_roots_balanced_mobile)
## boundary (singular) fit: see help('isSingular')
summary(model_d_f0_m)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: ratio_mean_f0 ~ WordForm * factor(root_stress) + (1 | root) +
## (1 | Speaker)
## Data: cs_roots_balanced_mobile
##
## REML criterion at convergence: -60.2
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.7667 -0.6269 0.0946 0.4130 2.7693
##
## Random effects:
## Groups Name Variance Std.Dev.
## root (Intercept) 0.000000 0.00000
## Speaker (Intercept) 0.002064 0.04543
## Residual 0.015615 0.12496
## Number of obs: 59, groups: root, 16; Speaker, 4
##
## Fixed effects:
## Estimate Std. Error df t value
## (Intercept) 1.05788 0.03343 6.05827 31.645
## WordForminflected -0.03095 0.03504 52.12024 -0.883
## factor(root_stress)2 0.04436 0.06712 52.00717 0.661
## WordForminflected:factor(root_stress)2 -0.07679 0.09506 52.01201 -0.808
## Pr(>|t|)
## (Intercept) 5.84e-08 ***
## WordForminflected 0.381
## factor(root_stress)2 0.512
## WordForminflected:factor(root_stress)2 0.423
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) WrdFrm fc(_)2
## WrdFrmnflct -0.514
## fctr(rt_s)2 -0.268 0.256
## WrdFrm:(_)2 0.189 -0.369 -0.706
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')
OBSERVATION: Model_D
model_predictions = read_csv("/Users/aidyn/Downloads/Fixed_Roots_Model_Predictions.csv")
## Rows: 12 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): Model, Dataset, Formula, Term, Interpretation
## dbl (2): Estimate, p-value
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
model_predictions
model_predictions_mobile = read_csv("/Users/aidyn/Downloads/Mobile_Roots_Model_Predictions.csv")
## Rows: 12 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): Model, Dataset, Formula, Term, Interpretation
## dbl (2): Estimate, p-value
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
model_predictions_mobile
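The two tables above are read from CSV files at absolute paths on one machine. A possible alternative is to build a comparable table directly from the fitted model objects, which would keep it in sync with the models. The sketch below assumes `lmerTest` is loaded (as in this session) and uses simulated stand-in data; `tidy_fixef()` is a hypothetical helper, and its output columns mirror the `Model`, `Term`, `Estimate`, and `p-value` columns listed in the CSV column specification.

```r
library(lmerTest)  # also loads lme4; adds Satterthwaite p-values to summary()

# Hypothetical helper: one row per fixed-effect term of a fitted lmerTest model.
tidy_fixef <- function(model, label) {
  tab <- as.data.frame(coef(summary(model)))
  data.frame(
    Model     = label,
    Term      = rownames(tab),
    Estimate  = tab[["Estimate"]],
    `p-value` = tab[["Pr(>|t|)"]],
    check.names = FALSE,
    row.names = NULL
  )
}

set.seed(2)
# Simulated stand-in data (hypothetical), not the study measurements.
sim <- expand.grid(
  Speaker  = factor(1:4),
  root     = factor(1:12),
  WordForm = factor(c("uninflected", "inflected"),
                    levels = c("uninflected", "inflected"))
)
sim$ratio <- 1 +
  rnorm(12, sd = 0.3)[as.integer(sim$root)] +
  rnorm(4, sd = 0.1)[as.integer(sim$Speaker)] +
  rnorm(nrow(sim), sd = 0.2)

m <- lmer(ratio ~ WordForm + (1 | root) + (1 | Speaker), data = sim)
tidy_fixef(m, "duration, simulated")
```

Applied to the real models, the same helper could be called once per model (for example on `model_d_dur` and `model_d_dur_m`) and the results stacked with `rbind()`.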
Summary
| Feature | Fixed Roots | Mobile Roots |
|---|---|---|
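| Duration | Root stress affects the s1/s2 ratio (root_stress = 2: -0.592, p = 0.010). Inflection has no effect (p = 0.932). | No significant effects. |
| Intensity | No significant effects. The inflected estimate is negative but negligible (-0.009, p = 0.796). | Inflection marginally lowers the intensity ratio (-0.061, p = 0.053), consistent with a stress shift. |
| F0 | Root stress affects the F0 ratio (root_stress = 2 lowers it: -0.072, p = 0.030). Singular fit. | No significant effects. Singular fit. |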
In fixed roots, inflection does not change any of the three ratios, which is consistent with stress remaining on the root; the position of root stress itself is reflected in the duration and F0 ratios.
Mobile roots show only weak acoustic evidence of stress shift: a marginal drop in the intensity ratio under inflection (p = 0.053), with no corresponding effect in duration or F0.
Inflection alone is not a strong predictor of stress shift, but in mobile roots it may serve as a cue for reduced root prominence.
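Because the `WordForm` effect is tested in six separate models, one way to gauge how weak the marginal result is would be to adjust the six `WordForminflected` p-values for multiple comparisons. The sketch below applies a Holm correction with base R's `p.adjust()`. Treating the six tests as one family is an assumption made here for illustration, not something the analysis specifies; the p-values are copied from the summaries above.

```r
# p-values for WordForminflected, copied from the six model summaries above.
p_wordform <- c(
  fixed_dur  = 0.932, fixed_int  = 0.796,  fixed_f0  = 0.2943,
  mobile_dur = 0.576, mobile_int = 0.0525, mobile_f0 = 0.381
)

# Holm adjustment, treating the six tests as a single family (an assumption).
p_holm <- p.adjust(p_wordform, method = "holm")
round(p_holm, 3)
```

Under this assumption the smallest adjusted p-value (mobile roots, intensity) is 0.315 and all others are 1, so the marginal intensity effect would not survive correction.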
# Test H4: Stress follows Russian rules
# Prediction: s1 and s2 have significantly longer durations than s3.
# The dataset contains durations of s1, s2, and s3 for fixed and mobile roots
# model_cs_all_shift <- lmer(Duration_in_ms ~ ShiftDirect + SyllPos + Stress + (1|Speaker), data = cs_all_syll_shift)
# summary(model_cs_all_shift)
These results indicate that stress in CS nouns remains on the Russian root, while final syllables exhibit Kazakh-style lengthening, supporting a hybrid prosodic pattern. This outcome suggests that bilinguals represent and coordinate multiple phonological systems even at the word level.
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-apple-darwin20
## Running under: macOS Ventura 13.7.1
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: America/Los_Angeles
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] lmerTest_3.1-3 lme4_1.1-36 Matrix_1.7-1 patchwork_1.3.0
## [5] modelr_0.1.11 lubridate_1.9.3 forcats_1.0.0 stringr_1.5.1
## [9] dplyr_1.1.4 purrr_1.0.2 readr_2.1.5 tidyr_1.3.1
## [13] tibble_3.2.1 ggplot2_3.5.1 tidyverse_2.0.0
##
## loaded via a namespace (and not attached):
## [1] gtable_0.3.6 xfun_0.49 bslib_0.8.0
## [4] lattice_0.22-6 numDeriv_2016.8-1.1 tzdb_0.4.0
## [7] Rdpack_2.6.2 vctrs_0.6.5 tools_4.4.2
## [10] generics_0.1.3 parallel_4.4.2 fansi_1.0.6
## [13] pkgconfig_2.0.3 lifecycle_1.0.4 compiler_4.4.2
## [16] farver_2.1.2 textshaping_0.4.0 munsell_0.5.1
## [19] htmltools_0.5.8.1 sass_0.4.9 yaml_2.3.10
## [22] pillar_1.9.0 nloptr_2.1.1 crayon_1.5.3
## [25] jquerylib_0.1.4 MASS_7.3-61 cachem_1.1.0
## [28] reformulas_0.4.0 boot_1.3-31 nlme_3.1-166
## [31] tidyselect_1.2.1 digest_0.6.37 stringi_1.8.7
## [34] labeling_0.4.3 splines_4.4.2 fastmap_1.2.0
## [37] grid_4.4.2 colorspace_2.1-1 cli_3.6.3
## [40] magrittr_2.0.3 utf8_1.2.4 broom_1.0.7
## [43] withr_3.0.2 scales_1.3.0 backports_1.5.0
## [46] bit64_4.5.2 timechange_0.3.0 rmarkdown_2.29
## [49] bit_4.5.0 ragg_1.3.3 hms_1.1.3
## [52] evaluate_1.0.1 knitr_1.49 rbibutils_2.3
## [55] rlang_1.1.4 Rcpp_1.0.14 glue_1.8.0
## [58] rstudioapi_0.17.1 vroom_1.6.5 minqa_1.2.8
## [61] jsonlite_1.8.9 R6_2.5.1 systemfonts_1.1.0